Methodology and Purpose


Introduction 

Two decades of research have 
documented what most of us 
already know: no in-school 
factor matters more to students' 
educational experiences and 
outcomes than the effectiveness 
of their teacher. 1 As a result, 
the national airspace is 
increasingly crowded with 
proposed reforms and initiatives 
designed to boost teacher 
performance, including new 
evaluation systems and increased 
accountability. This activity
was crystallized by the U.S. 
Department of Education's 2010 
Race to the Top competition, 
which emphasized the 
development of evaluation 
systems that involve measures of 
student growth. 2 
PACER's goal is to inform state education policy
discussions using rigorous, objective research. The 
roughly two dozen works cited in this brief include 
newspaper coverage; government documents (e.g., 
federal agency guidance); research syntheses from 
non-partisan sources such as the Consortium for 
Policy Research in Education and the National 
Comprehensive Center for Teacher Quality; original 
research by Research for Action (RFA) scholars; 
and rigorous, peer-reviewed works from some of 
the nation's most respected education researchers. 
Where we cite an organization that has a clear policy
agenda (for example, The New Teacher Project or
the National Council on Teacher Quality), we do
so because the publication in question has had a
significant impact on the policy landscape.

It is important to note that to arrive at a measure 
of teachers' effectiveness, researchers must, of 
course, identify a benchmark or standard. Given 
the central role of accountability in K-12 public 
education and the need for common, objective 
measures on which to base comparisons, research 
has focused overwhelmingly on achievement results 
from state assessments. While student achievement 
is the central goal of schools, there are important 
aspects of academic achievement that cannot easily
be tested, and key characteristics of good teaching
(e.g., supporting colleagues, engaging with parents, 
mentoring students to ensure regular attendance and 
progress toward graduation) may not show up in 
standardized test results. 3 

What follows is a summary of research and emerging
practices in other states and districts, designed to
provide state lawmakers, legislative staff, State Board
of Education members, Department of Education
officials, and other key stakeholders with the information
they need to ensure the effectiveness of their reform
efforts.
The Pennsylvania Context


Teacher evaluation will be a key focus of the fall legislative session.
Currently in Pennsylvania, teacher evaluation rating forms are based on 
Charlotte Danielson's professional framework, which assesses Planning 
and Preparation, Classroom Environment, Professional Responsibilities, 
and Instruction. A 2011 survey by the Pennsylvania Department of 
Education found that over 99% of all teachers and administrators 
received a satisfactory rating during the 2009-10 school year. 4 

Pennsylvania is already grappling with the challenges that accompany 
teacher evaluation redesign in a number of ways: 

• Pittsburgh Public Schools — one of four districts nationally 
awarded grants by the Bill and Melinda Gates Foundation to design 
comprehensive reform plans — is working with the local union, the 
Pittsburgh Federation of Teachers, to revamp its evaluation policies and 
expand professional opportunities for highly-effective teachers. 

• Three school districts statewide, along with an Intermediate Unit, are 
currently engaged in a Gates Foundation-funded pilot that is working to 
inform development of statewide reforms, including evaluation tools. 

• The Pennsylvania Department of Education will expand this pilot to 
reach 20% of Local Education Agencies (LEAs) during the 2011-12 school 
year, before going to scale in all districts in 2012-13. 5 

• Senate Bill 1087, sponsored by State Senator Jeffrey Piccola, Chair of 
the Senate Education Committee, was passed by the Senate Education 
Committee on June 14, 2011. The legislation states that "for a teacher 
evaluation and rating system to be thorough and effective, it is essential 
that the system include student performance as an element in measuring 
teacher quality." 
Teacher Evaluation: Questions and Answers


Q How do school districts currently evaluate teachers?

A Across Pennsylvania and nationwide, the most widely used teacher evaluation
method is classroom observation by an administrator or peer reviewer, though the
quality, use, and application of observation protocols differ across states, districts, and
even within schools. 6 While the number of observations per school year varies based on
a host of factors, including prior evaluations, teacher experience, local staffing issues,
and the local collective bargaining agreement, evaluation results are generally quite
uniform. The New Teacher Project's 2009 report, The Widget Effect, reveals that, as in
Pennsylvania, the vast majority of educators (99% or more) receive satisfactory
teacher evaluations. 7



Q Are classroom observations a strong measure of teacher quality?

A They may be, under specific circumstances. Drawing on approximately 120
teacher evaluation studies, the nonpartisan National Comprehensive Center 
for Teacher Quality found that "some highly researched protocols have been found 
to link to student achievement, though associations are sometimes modest. Research 
and validity findings are highly dependent on the instrument used, sampling 
procedures, and training of raters." 8 Put another way, effective observation hinges on 
a reviewer who knows good teaching when he or she sees it, rigorous training in the 
use of appropriate evaluation tools, and ongoing professional development to ensure 
successful implementation and consistent use of evaluative practices. 

When effective observation is employed, important correlations may emerge. 
Heneman and colleagues (2006) examined the use of evaluation systems modeled on 
or modified from the Charlotte Danielson framework in three districts nationwide — 
Cincinnati, Ohio; 9 Coventry, Rhode Island; and Washoe County, Nevada (Reno) — plus 
a Los Angeles area charter school; together, these districts employed more than 6,300 
teachers and educated approximately 107,000 students. Over three years, researchers 
documented "positive relationships between teacher evaluation scores and student 
achievement" at all four sites. These results suggest that regular, rigorous observation- 
based evaluation accompanied by focused professional development can have "a 
substantial positive relationship with student achievement and that the instructional 
practices measured by these systems contribute to student learning." 10 
Q What do years of teaching experience and graduate degrees tell us
about teacher effectiveness as measured by test scores? 

A The research on the relationship between student achievement and two common
yardsticks for evaluation and compensation (years of teaching experience and
graduate degrees) is mixed.

Experience 

Rigorous research over the past decade has consistently provided evidence that novice 
teachers are less effective than their more seasoned counterparts, and that the pace of 
improvement is most rapid early on; key findings demonstrate a leveling off in teacher 
effectiveness after the first few years. 11 Two more recent studies, drawing on established 
statewide longitudinal data systems in North Carolina 12 and Florida, suggest that 
teachers in certain grades continue to show slight improvement even "beyond the first 
few years" 13 — a finding consistent with initial analysis by RFA staff using data from 
Pennsylvania's student-level data system. 

Graduate Degrees 

Findings on the effects of teacher credentials on student achievement do not generally 
provide compelling support for investments in degree attainment. A synthesis of rigorous 
research on this question by the Center for Educator Compensation Reform noted that 
"the preponderance of evidence suggests that teachers who have completed graduate 
degrees are not significantly more effective at increasing student learning than those 
with no more than a bachelor's degree." 14 Clotfelter, Ladd, and Vigdor (2007) go further, 
reporting "small or negative effects of having a graduate degree" on student achievement 
in the elementary grades. 15 However, they also note that master's-level training appears 
to be beneficial to secondary teachers. 16 

A clear take-away from the research is the need to distinguish between postgraduate 
training in general and advanced training in a teacher's content area. It would be useful 
to enhance the research base on the impacts of specific programs and their outcomes 
for students. 


Fewer than 1% of teachers
in Pennsylvania and nationally
are rated unsatisfactory.
Q Beyond classroom observations, teacher experience, and credentials,
what other options for measuring teacher effectiveness are being
explored or implemented in states and districts?

A Since the beginning of the standards-based reform movement in the 1990s,
policymakers have looked to assessment systems to measure the level of student
learning in schools and districts, and to inform accountability systems. However,
annual state assessment data give only a snapshot of student performance and
provide limited information on the impact of a school or an individual teacher on
student learning. To help quantify school and teacher effects on student learning, Dr.
William Sanders developed value-added modeling (VAM). Simply put, when used to
evaluate teachers, VAM leverages a student's previous assessment scores to predict their 
performance on future assessments; the difference between the predicted and actual 
scores determines the academic growth of the student. A teacher's impact on student 
learning is determined by looking at the average growth score of the teacher's students, 
and then comparing it to other teachers in the district or to a pre-determined standard. It 
is important to underscore that VAM is not a system or battery of student assessments; 
rather, VAM is the analysis of student achievement data from existing assessment systems 
to derive measures of growth and the influence of schools and teachers on that growth. 17 
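
As a rough illustration of the arithmetic described above, the following Python sketch predicts each student's current score from a prior score with a simple least-squares line, treats actual minus predicted as that student's growth, and averages growth by teacher. The student records, the teacher labels, and the single-predictor model are all hypothetical simplifications; operational value-added systems draw on multiple years of data and considerably more complex statistical models.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, current-year score).
records = [
    ("Teacher A", 40, 52),
    ("Teacher A", 55, 63),
    ("Teacher A", 70, 80),
    ("Teacher B", 45, 47),
    ("Teacher B", 60, 62),
    ("Teacher B", 75, 74),
]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def teacher_growth(rows):
    """Average (actual - predicted) score for each teacher's students."""
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    growth = defaultdict(list)
    for teacher, prior, actual in rows:
        predicted = intercept + slope * prior  # expected score given prior score
        growth[teacher].append(actual - predicted)  # student-level growth
    return {teacher: mean(values) for teacher, values in growth.items()}


growth_by_teacher = teacher_growth(records)
for teacher, value in sorted(growth_by_teacher.items()):
    print(f"{teacher}: average growth {value:+.1f} points")
```

In this toy data set, a positive average means a teacher's students scored above what their prior scores predicted, and a negative average means they scored below it. The comparison is relative to the other teachers in the same data, mirroring the district-level comparison described above.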

Pennsylvania's value-added assessment system, PVAAS, uses data from the state
assessment (the PSSA) both to analyze the academic growth of groups (cohorts)
of students over time and to provide predictive information about future
student growth. 18 To date, PVAAS has been used to provide teachers with feedback on
instruction, but it is not a component of evaluation or pay.


A value-added measure is a "collection of complex 
statistical techniques that use multiple years of 
students' test score data to estimate the effects of 
individual schools or teachers." 19

(McCaffrey, Lockwood, Koretz & Hamilton, 2003) 




Q How do value-added measures compare with classroom observations?

A Value-added measures are comparable to classroom observations by school
administrators in identifying the strongest and weakest teachers in a school, though
VAM generally does a better job of predicting future student achievement than
principal ratings. 20 Harris and Sass found that analyzing multiple years of student
test scores is more predictive of future teacher performance than classroom
observations, though the reverse is true when data are limited to a single school
year, as may be the case with a novice teacher. The authors note that classroom
observations are likely to draw on a "broader set of characteristics" than the ability
to raise achievement scores (e.g., interpersonal skills, relationships with parents and
colleagues, rapport with students) and that this additional information can
"substantially increase predictive power," a strong argument for using multiple
measures in evaluation decisions. 21


Table 1: Value-Added Measures: Limitations and Strengths

STRENGTHS

Value-added measures:

• Provide a more objective option than other teacher effectiveness measures

• Identify evidence about which teacher characteristics impact student learning

• Cost less in the long term as compared to classroom observations and portfolios

• Focus on student learning instead of teacher practice

• Create opportunities to identify and learn from the most highly successful teachers

LIMITATIONS

Value-added measures:

• Cannot be applied to teachers in non-tested grades and subjects

• May be skewed by incomplete student data or small sample size

• Do not isolate individual teacher contributions from multiple other factors

• Involve complex methodological issues that can compromise teacher scores

• Vary in rigor depending upon the appropriateness of teacher comparison groups
and the quality of value-added method used

Table 1 Source: Goe, L. (May 2008). Key Issue: Using Value-Added Models to Identify
and Support Highly Effective Teachers. Washington, D.C.: National Comprehensive
Center for Teacher Quality.


Q What practical and technical issues should policymakers examine 
when considering the use of value-added measures in 
teacher evaluation? 

A There are both practical and technical concerns that may arise with the
implementation of value-added measures. Some are rather obvious and already
topics of study in Harrisburg and statewide, such as how to treat the majority of
teachers in untested subjects and grades (referred to in a recent report by the Center
for Educator Compensation Reform as "The Other 69%") when value-added measures are
limited to mathematics and language arts in certain tested grades. Other concerns are
more subtle, such as the possibility that value-added measures could misattribute teacher 
contributions. For example, the work of a social studies educator who emphasizes 
document analysis and open-ended writing may show up in value-added measures for 
an English instructor. Similarly, an analysis of widely used value-added models found 
that factors other than instruction, such as student classroom placement or external 
events, may be partially responsible for driving student learning growth. 22 

Technical issues also merit consideration, including concerns about the precision of such 
measures. For example, a political poll is never a perfect measure of how two (or three) 
candidates are faring in a race; the measures are therefore accompanied by a margin of 
error (e.g., plus or minus 4%), with larger error bands for smaller samples. As the focus
shifts from a large group of students at the school level to the far smaller group in a
single classroom, value-added scores will carry larger margins of error.
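
The polling analogy can be made concrete with the textbook margin-of-error formula for a sample mean, roughly 1.96 times the standard deviation divided by the square root of the sample size at a 95% confidence level. The Python sketch below applies that formula to hypothetical group sizes. The standard deviation of 15 points is an assumed figure chosen for illustration, not one drawn from the research cited here, and real value-added models estimate their uncertainty in more elaborate ways.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Approximate 95% margin of error for the mean of n independent scores."""
    return z * std_dev / math.sqrt(n)


ASSUMED_SD = 15.0  # hypothetical spread of student growth scores, in points

# Hypothetical group sizes: a whole school, one grade level, one classroom.
groups = {"school": 500, "grade level": 100, "classroom": 25}
margins = {name: margin_of_error(ASSUMED_SD, n) for name, n in groups.items()}

for name, n in groups.items():
    print(f"{name:12s} n={n:4d}  margin of error: +/- {margins[name]:.1f} points")
```

Under these assumptions, quadrupling the number of students only halves the margin of error, which is why an estimate built on one classroom is noticeably less precise than one built on a whole school.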


Q Which teacher evaluation measures are most appropriate for
which purposes?

A Teachers, like students, should not be evaluated by a single measure. There is
strong consensus among researchers that student performance data should make up only
a portion of an evaluation system, and recent policy changes in states across the country
reflect this by consistently requiring multiple measures of teacher effectiveness. Little et
al. (2009) argue that "a well-conceived [evaluation] system should combine approaches
[of teacher evaluation] to gain the most complete understanding of teaching." 23



Evaluation methods should also be aligned with the purpose of the evaluation. As the 
National Comprehensive Center for Teacher Quality points out, purposes could include not 
only high-stakes decision-making such as tenure, promotion, and hiring and firing, but also
sharing feedback directly with teachers, providing continuing support, identifying potential 
teacher leaders, and determining professional development needs. Table 2, below, provides an 
overview of evaluation methods and how they may best be used. 

Table 2: Teacher Evaluation Methods by Purpose

| Purpose of Evaluation | Value-Added | Classroom Observation | Artifacts Analysis | Portfolio | Self Report | Student Survey |
|---|---|---|---|---|---|---|
| Determine whether teachers' students are meeting growth expectations | • |   | • |   |   |   |
| Provide new teachers with guidance |   | • | • | • |   |   |
| Evaluate teachers in non-tested grades and/or subjects |   | • | • | • |   | • |
| Determine professional development needs | • | • |   |   | • |   |
| Contract renewal and tenure decisions | • | • |   |   |   |   |
| Compensation and incentive reward decisions | • | • |   |   |   |   |

Source: Little, O., Goe, L., and Bell, C. (2009). A practical guide to evaluating teacher effectiveness.
Chicago: National Comprehensive Center for Teacher Quality. Summary adapted with permission of the author.


Q How can states ensure that teacher evaluation strategies are
both fair and effective?

A While teacher evaluation, like student evaluation, involves significant
methodological challenges and thorny policy questions, there is good guidance
to show the way forward. Two key principles for developing fair, usable assessment
systems are validity and reliability.

Table 3: Key Tenets of Effective Evaluation Systems

PROPRIETY
"Standards are intended to ensure that a personnel evaluation will be conducted legally,
ethically, and with due regard for the welfare of the evaluatee and those involved in the
evaluation."
APPLICATION: Is the evaluation administered in accord with state law and district policy?

UTILITY
"Standards are intended to guide evaluations so that they will be informative, timely, and
influential."
APPLICATION: Are evaluation results shared with the teacher, and is he/she given an
opportunity to understand the ratings and the evidence undergirding them?

FEASIBILITY
"Standards are intended to guide personnel evaluation systems so that they are as easy to
implement as possible, efficient in their use of time and resources, adequately funded, and
viable from a political standpoint."
APPLICATION: Does the building principal have enough time to evaluate teachers? Are the
costs of the evaluation affordable? Do stakeholders understand the evaluation system?

ACCURACY
"Standards determine whether an evaluation has produced sound information. Personnel
evaluations must be technically adequate and as complete as possible to allow sound
judgments and decisions to be made. The evaluation methodology should be appropriate for
the purpose of the evaluation and the evaluatees being evaluated and the context in which
they work."
APPLICATION: Is the evaluation system rigorous and authentic? Does it produce clear,
defensible, and accurate measures of performance?

Source: Joint Committee on Standards for Educational Evaluation (1988). Retrieved from http://www.jcsee.org
Validity refers to the extent to which a measurement is truly representative of what
it claims to gauge; in other words, teachers should clearly understand the criteria on 
which they're being judged, and there should be alignment between those standards 
and the evaluation. 

Reliability speaks to the fact that consistency in ratings — over time and across raters — 
is a precondition for any fair evaluation. 

The Joint Committee on Standards for Educational Evaluation's Personnel Evaluation 
Standards outlines additional key tenets for structuring reliable, rigorous measures. 24 
The four standards are summarized in Table 3, along with examples of practical 
applications for each. 


Q How are states and districts responding to calls to include 
student performance in teacher assessments? 


A As states and districts redesign teacher evaluation systems, teacher ratings are
increasingly being tied to decisions about tenure, dismissal, and teacher pay.
Reporting by Education Week found that nationwide, during the most recent legislative
sessions, "eight states ... linked evaluations to student achievement, with most
eventually requiring that 50 percent of an evaluation score be based on student data." 25
Table 4, which provides detail on several states, including four of Pennsylvania's
neighbors (Delaware, Maryland, New York, and Ohio), illustrates this trend. While
New Jersey has not enacted legislation, at least 10 districts in the state will pilot a
new teacher evaluation model, Excellent Educators for New Jersey (EE4NJ), during the
2011-12 school year, based on the recommendations of the state educator effectiveness
task force released in March 2011.


Q Is there any research evidence regarding how much student
achievement should factor into teacher effectiveness measures?

A No. To date, there has been no research that examines the pros and cons of the
relative weight of student achievement in evaluations of teacher effectiveness.

As can be seen in Table 4, most states weight student achievement at or around 
50%, though the precise composition of this 50% varies across states. This weight 
can be traced to the Race to the Top competition and earlier policy goals advanced 
by the National Council on Teacher Quality that student learning should be the 
"preponderant criterion" in evaluation decisions. 26 Rigorous research on this question is 
vital as more states explore performance measures in evaluation and assign weights. 
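
To make the weighting arithmetic concrete, the Python sketch below combines component ratings into a single composite using Tennessee's split from Table 4 (35% on the state student growth score and 15% on other student achievement). Assigning the remaining 50% to a single "other measures" component is an assumption made for illustration, and the component ratings and the 0-100 scale are hypothetical; actual state scoring rules are more detailed.

```python
# The student-performance weights follow the Tennessee row of Table 4.
# Lumping the remaining 50% into one "other measures" component is an assumption.
WEIGHTS = {
    "state_growth": 0.35,
    "other_achievement": 0.15,
    "other_measures": 0.50,
}


def composite_score(ratings, weights=WEIGHTS):
    """Weighted average of component ratings that share a common scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * ratings[name] for name in weights)


# Hypothetical ratings for one teacher on a 0-100 scale.
ratings = {"state_growth": 60.0, "other_achievement": 70.0, "other_measures": 90.0}
score = composite_score(ratings)
print(f"Composite rating: {score:.1f}")
```

Changing the weights changes the composite even when the component ratings stay fixed, which is one reason the choice of weights merits the research attention called for above.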
Table 4: Survey of State Actions


RECENT POLICY CHANGES TO INCLUDE STUDENT PERFORMANCE IN GENERAL EDUCATION
TEACHER EVALUATION SYSTEMS

| State | What % of evaluation is tied to student performance? | Is evaluation required annually? | What teacher decisions does the new evaluation system impact? | What legislation, regulation or program is the basis for this policy? | When is the evaluation system to be fully implemented? |
|---|---|---|---|---|---|
| COLORADO | 50% | Yes | Tenure, Compensation | S.B. 191 (2010) | 2013-2014 |
| DELAWARE | Not Specified 27 | No 28 | Tenure, Compensation | Delaware Performance Appraisal System (DPAS II) | 2012-2013 |
| DISTRICT OF COLUMBIA | 50% | Yes | Tenure, Retention, Compensation | IMPACT (2009) | Introduced in 2009-2010 |
| FLORIDA 29 | 50% | Yes | Tenure, Retention, Compensation | S.B. 736 (2011), "Student Success Act" | 2014-2015 |
| MARYLAND | 50% (30% on state assessment & 20% on locally determined measure) | Yes | Tenure | Education Reform Act (2010) | 2013-2014 |
| NEW YORK | 40% (20% on state assessments & 20% on locally determined measures) | Yes | Tenure, Retention, Compensation | Section 3012-c of Education Law (2010) | 2012-2013 |
| OHIO | 50% | Yes | Tenure, Retention, Compensation | S.B. 5 & H.B. 153 (2011) | 2013-2014 |
| TENNESSEE | 50% (35% on state student growth score & 15% on other student achievement) | Yes | Tenure, Retention, Compensation | S.B. 7005A (2010), H.B. 7010A (2010) | 2011-2012 |
| RHODE ISLAND | 51% (40% in 2011-12, 45% in 2012-13, 51% in 2013-14) | Yes | Tenure, Retention | Educator Evaluation System Standards (2009) | 2013-2014 |

27 While a percentage for student achievement is not specified, a teacher cannot be evaluated as effective unless s/he has received a Satisfactory Component Rating in
at least three (3) appraisal components including the Student Improvement Component, which includes student growth data.

28 Experienced teachers who earn a rating of "highly effective" on their most recent summative evaluation must receive a summative evaluation at least once every
two years. However, the "student improvement" component must be evaluated every year.

29 Regarding tenure, as of July 1, 2011, new teachers will work under annual contracts, instead of receiving tenure.
Conclusion


Ultimately, any evaluation system involves important tradeoffs between 
rigor and utility, design and cost, and precision and subjectivity. But given 
the emerging consensus that effective evaluation is not a "satisfactory, 
unsatisfactory" proposition, there is clear room for improvement — both in 
state policy and in local practice. Because of Pennsylvania's unique traits — 
e.g., deep experience with both value-added measures and Danielson's 
Framework for Teaching, a strong labor history coupled with reform-minded 
union leadership in communities such as Pittsburgh — these efforts are likely 
to have relevance far beyond our state's borders. By designing reforms 
that build on the research base; emphasize multiple, rigorous measures; 
and promote regular, meaningful feedback on instruction, Pennsylvania's 
education policymakers can help ensure that reforms to teacher evaluation
systems are work worth emulating.